Python: randomly extract part of the data in an Excel file and output it as a new spreadsheet

Neumann.N21

Last modified 2022-11-07 15:07:35

First published 2022-06-15 10:40:01

Article link: https://blog.csdn.net/Nuemann_N21/article/details/125292493

Extract 10% of the rows from a 60,000-row Excel file to use as a model test set, and write the remaining rows to a separate file.

import openpyxl
import random
from openpyxl import load_workbook

# Input path and file name
INPUT_FILES_BASE_PATH = "D:\\Users\\User\\Desktop\\"
EXCEL_FILENAME = 'name.xlsx'

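def get_row_value(ws, row):
    """Return the values of every cell in the given 1-based row as a list."""
    col_num = ws.max_column  # the attribute is max_column, not max_colum
    row_data = []
    for i in range(1, col_num + 1):
        cell_value = ws.cell(row=row, column=i).value
        row_data.append(cell_value)
    return row_data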

if __name__ == "__main__":

    # Input
    wb = load_workbook(INPUT_FILES_BASE_PATH + EXCEL_FILENAME)
    sheet = wb.active
    row_num = sheet.max_row
    # Randomly sample 10% of the rows, skipping the header in row 1.
    # Change the divisor (10) to use a different sampling rate.
    random_num = random.sample(range(2, row_num + 1), row_num // 10)

    # Write the results to new workbooks

    # Randomly extracted part
    wb2 = openpyxl.Workbook()
    sheet2 = wb2.active
    sheet2.append(get_row_value(sheet, 1))  # header row
    for j in random_num:
        sheet2.append(get_row_value(sheet, j))

    # Optional: uncomment to record the sampled row numbers in the last row of the table
    # sheet2.append(['The random numbers generated are:'] + random_num)

    # Output
    out_file_name1 = 'RandomExtract.xlsx'
    wb2.save(out_file_name1)

    # Remainder
    wb3 = openpyxl.Workbook()
    sheet3 = wb3.active
    sheet3.append(get_row_value(sheet, 1))  # header row
    sampled_rows = set(random_num)  # set membership tests are O(1)
    for m in range(2, row_num + 1):
        if m not in sampled_rows:
            sheet3.append(get_row_value(sheet, m))

    # Output
    out_file_name2 = 'RandomRemain.xlsx'
    wb3.save(out_file_name2)

    print('Extraction succeeded')
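
The split itself can also be done in a single pass over rows already held in memory. The following is only a minimal sketch of that idea, not the script above: the function name split_rows and its fraction and seed parameters are hypothetical, and it assumes the input is non-empty with the header as its first row. Unlike the script above, it keeps the sampled rows in their original order.

```python
import random


def split_rows(rows, fraction=0.1, seed=None):
    """Split rows (header first) into a sampled part and a remainder.

    Returns (sampled, remaining); both lists start with the header row.
    Hypothetical helper for illustration; assumes at least a header row.
    """
    rows = list(rows)
    header, data = rows[0], rows[1:]
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    # Positions in `data` chosen without replacement.
    picked = set(rng.sample(range(len(data)), int(len(data) * fraction)))
    sampled = [header] + [r for i, r in enumerate(data) if i in picked]
    remaining = [header] + [r for i, r in enumerate(data) if i not in picked]
    return sampled, remaining


# Tiny in-memory example: one header row plus 20 data rows.
table = [("id", "value")] + [(n, n * n) for n in range(20)]
test_part, train_part = split_rows(table, fraction=0.1, seed=0)
print(len(test_part) - 1, "sampled rows,", len(train_part) - 1, "remaining rows")
```

With openpyxl, the rows could come from sheet.iter_rows(values_only=True), and each row of the two results could be passed to append on a new worksheet.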

***************************************************************************

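Addendum, 2022-11-07:

Some readers hit the following error when running the script:

AttributeError: 'Worksheet' object has no attribute 'max_colum'

Things to check:

1. The attribute name. openpyxl's worksheet attribute is max_column; max_colum is a misspelling, so make sure get_row_value reads ws.max_column. My openpyxl version is 3.0.9; if the error persists, compare your installed version with mine.

2. The file extension. The Excel file being read and saved must be .xlsx; openpyxl cannot read the older .xls format.

3. XlsxWriter not being installed; see https://www.pythonheidong.com/blog/article/505502/0419a71691120e53ac8f/ (I have not run into this case myself, and the script above imports only openpyxl.)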
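
The version and file-extension checks from this addendum can be scripted before running the extraction. This is a rough sketch using only the standard library (Python 3.8 or later); the helper names installed_version and is_xlsx are hypothetical and not part of openpyxl.

```python
from importlib import metadata
from pathlib import Path


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_xlsx(path):
    """True if the file name ends in .xlsx (case-insensitive)."""
    return Path(path).suffix.lower() == ".xlsx"


# Report the installed openpyxl version (None means it is not installed).
print("openpyxl version:", installed_version("openpyxl"))
# Check the extension of the input file name used in the script.
print("name.xlsx is .xlsx:", is_xlsx("name.xlsx"))
```

If the version printed differs from 3.0.9, that is the version the script in this post was run with.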